DiGrad: Multi-Task Reinforcement Learning with Shared Actions
نویسندگان
چکیده
Most reinforcement learning algorithms are inefficient for learning multiple tasks in complex robotic systems, where different tasks share a set of actions. In such environments a compound policy may be learnt with shared neural network parameters, which performs multiple tasks concurrently. However such compound policy may get biased towards a task or the gradients from different tasks negate each other, making the learning unstable and sometimes less data efficient. In this paper, we propose a new approach for simultaneous training of multiple tasks sharing a set of common actions in continuous action spaces, which we call as DiGrad (Differential Policy Gradient). The proposed framework is based on differential policy gradients and can accommodate multi-task learning in a single actor-critic network. We also propose a simple heuristic in the differential policy gradient update to further improve the learning. The proposed architecture was tested on 8 link planar manipulator and 27 degrees of freedom(DoF) Humanoid for learning multi-goal reachability tasks for 3 and 2 end effectors respectively. We show that our approach supports efficient multi-task learning in complex robotic systems, outperforming related methods in continuous action spaces.
منابع مشابه
Learning Shared Representations for Value Functions in Multi-task Reinforcement Learning
We investigate a paradigm in multi-task reinforcement learning (MT-RL) in which an agent is placed in an environment and needs to learn to perform a series of tasks, within this space. Since the environment does not change, there is potentially a lot of common ground amongst tasks and learning to solve them individually seems extremely wasteful. In this paper, we explicitly model and learn this...
متن کاملUtilizing Generalized Learning Automata for Finding Optimal Policies in MMDPs
Multi agent Markov decision processes (MMDPs), as the generalization of Markov decision processes to the multi agent case, have long been used for modeling multi agent system and are used as a suitable framework for Multi agent Reinforcement Learning. In this paper, a generalized learning automata based algorithm for finding optimal policies in MMDP is proposed. In the proposed algorithm, MMDP ...
متن کاملReinforcement Learning with Reusing Mechanism of Avoidance Actions and its Application to Learning Whole-Body Motions of Multi-Link Robot∗
In acquiring a motion only from its objective by learning, large cost such as damage from falling over and a large number of trials are required if the motion is a complex one, such as a jumping serve. Reusing the knowledge already learnt is an essential mechanism to learn such motions efficiently, like humans do. In this study, we propose to use a decomposition of action-value functions as a r...
متن کاملArgumentation Accelerated Reinforcement Learning for Cooperative Multi-Agent Systems
Multi-Agent Learning is a complex problem, especially in real-time systems. We address this problem by introducing Argumentation Accelerated Reinforcement Learning (AARL), which provides a methodology for defining heuristics, represented by arguments, and incorporates these heuristics into Reinforcement Learning (RL) by using reward shaping. We define AARL via argumentation and prove that it ca...
متن کاملMultiagent Credit Assignment in a Team of Cooperative Q-Learning Agents with a Parallel Task
Traditionally in many multiagent reinforcement learning researches, qualifying each individual agent’s behavior is responsibility of environment’s critic. However, in most practical cases, critic is not completely aware of effects of all agents’ actions on the team performance. Using agents’ learning history, it is possible to judge the correctness of their actions. To do so, we use team common...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1802.10463 شماره
صفحات -
تاریخ انتشار 2018